
    Probe set algorithms: is there a rational best bet?

    Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) interrogating each transcript (probe set). This redundancy allows for more robust background assessments and gene expression measures, and has enabled the development of many computational methods for translating image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms are now available, with a gradual movement away from chip-by-chip methods (MAS5) towards project-based model-fitting methods (dCHIP, RMA and others). Data interpretation is often profoundly changed by the choice of algorithm, leaving disoriented biologists asking which interpretation of their experiment is "accurate". Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalization, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered computationally appropriate, but they vary in biological credibility. We also illustrate the performance of two newer hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to the other algorithms in avoiding false positives from poorly performing probe sets. Based on our interpretation of the literature and the examples presented here, we suggest that the variability in performance of probe set algorithms depends more on assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.
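    As a hedged illustration of the point about "background" assumptions, the sketch below (Python, with invented probe intensities; it is not MAS5, PLIER, or any vendor's algorithm) shows how simply changing the weight given to mismatch (MM) probes changes the computed signal, and hence the fold change, for the same probe set.

```python
# Illustrative sketch only: how the background assumption alone changes a
# probe-set "signal" and the resulting expression ratio.
# Probe intensities below are made-up numbers for a single probe set.
import numpy as np

pm_a = np.array([820., 640., 900., 760.])      # perfect-match probes, condition A
mm_a = np.array([400., 380., 520., 410.])      # mismatch probes, condition A
pm_b = np.array([1450., 1100., 1600., 1300.])  # perfect-match probes, condition B
mm_b = np.array([430., 390., 560., 420.])      # mismatch probes, condition B

def signal(pm, mm, mm_weight):
    """Average probe intensity after subtracting a weighted mismatch background."""
    return np.mean(np.clip(pm - mm_weight * mm, 1.0, None))

for w in (0.0, 0.5, 1.0):                      # ignore, partially weight, or fully subtract MM
    ratio = signal(pm_b, mm_b, w) / signal(pm_a, mm_a, w)
    print(f"MM weight {w:.1f}: fold change B/A = {ratio:.2f}")
```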

    A first principles approach to differential expression in microarray data analysis

    Background: The disparate results from the methods commonly used to determine differential expression in Affymetrix microarray experiments may well result from the wide variety of probe set and probe level models employed. Here we take the approach of making the fewest assumptions about the structure of the microarray data. Specifically, we require only that, under the null hypothesis that a gene is not differentially expressed across the specified conditions, for any probe position in the gene's probe set: (a) the probe amplitudes are independent and identically distributed over the conditions, and (b) the distributions of the replicated probe amplitudes are amenable to classical analysis of variance (ANOVA). Log-amplitudes that have been standardized within-chip meet these conditions well enough for our approach, which is to perform ANOVA across conditions at each probe position and then take the median of the resulting (1 - p) values as a gene-level measure of differential expression.
    Results: We applied the technique to the HG-U133A, HG-U95A, and "Golden Spike" spike-in data sets. The resulting receiver operating characteristic (ROC) curves compared favorably with other published results. The procedure is quite sensitive, so much so that it has revealed the presence of probe sets that might properly be called "unanticipated positives" rather than "false positives", because plots of these probe sets strongly suggest that they are differentially expressed.
    Conclusion: The median ANOVA (1 - p) approach presented here is a very simple methodology that does not depend on any specific probe-level or probe-set model, and requires no pre-processing other than within-chip standardization of probe-level log amplitudes. Its performance is comparable to other published methods on the standard spike-in data sets, and it has revealed new categories of probe sets, which might properly be referred to as "unanticipated positives" and "unanticipated negatives", that need to be taken into account when using spike-in data sets as "truthed" test beds.
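    A minimal sketch of the median-ANOVA (1 - p) idea described above, assuming probe-level log-amplitudes have already been standardized within each chip; the array shapes and simulated values are illustrative only, not the published implementation.

```python
# Toy data: one probe set with 11 probe positions, 3 replicate chips per condition.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_probes, n_reps = 11, 3
cond_a = rng.normal(0.0, 1.0, (n_probes, n_reps))   # standardized log-amplitudes, condition A
cond_b = rng.normal(0.8, 1.0, (n_probes, n_reps))   # condition B, shifted to mimic differential expression

def median_anova_score(groups_per_probe):
    """One-way ANOVA at each probe position; return the median of (1 - p) as the gene-level score."""
    one_minus_p = []
    for groups in groups_per_probe:                  # groups = one array of replicates per condition
        _, p = f_oneway(*groups)
        one_minus_p.append(1.0 - p)
    return float(np.median(one_minus_p))

score = median_anova_score([(cond_a[i], cond_b[i]) for i in range(n_probes)])
print(f"gene-level differential-expression score: {score:.3f}")
```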

    Performance evaluation of commercial miRNA expression array platforms

    Background: MicroRNAs (miRNAs) are short, endogenous transcripts that negatively regulate the expression of specific mRNA targets. The relative abundance of miRNAs is linked to function in vivo, and miRNA expression patterns are potentially useful signatures for the development of diagnostic, prognostic and therapeutic biomarkers.
    Findings: We compared the performance characteristics of four commercial miRNA array technologies and found that all platforms performed well on the individual measures of performance.
    Conclusions: The Ambion and Agilent platforms were more accurate, whereas the Illumina and Exiqon platforms were more specific. Furthermore, the data analysis approach had a large impact on performance, predominantly by improving precision.

    puma: a Bioconductor package for propagating uncertainty in microarray analysis

    Background: Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating them to downstream analyses, it has been shown that results of differential expression detection, principal component analysis and clustering can be improved. Previously, implementations of these uncertainty propagation methods were only available as separate packages written in different languages. They were also very costly to compute and, in the case of differential expression detection, limited in the experimental designs to which they could be applied.
    Results: puma is a Bioconductor package incorporating a suite of analysis methods for Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself and in other Bioconductor packages. The implementation of the differential expression detection methods has been parallelised, leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty-propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation.
    Conclusion: For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.
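    The sketch below is not the puma model itself, only a toy illustration of why propagating per-measurement uncertainty matters: each expression estimate carries a standard error, condition means are combined by inverse-variance weighting, and conditions are compared with a z-like statistic. All numbers are invented.

```python
# Toy illustration of uncertainty propagation into a differential-expression score.
import numpy as np

expr_a = np.array([5.2, 5.6, 5.4])      # log-expression point estimates, condition A
se_a   = np.array([0.1, 0.9, 0.2])      # per-chip uncertainty (standard error) of those estimates
expr_b = np.array([6.1, 6.0, 6.3])      # condition B estimates
se_b   = np.array([0.2, 0.1, 0.8])      # condition B uncertainties

def weighted_mean(x, se):
    """Inverse-variance weighted mean and its standard error."""
    w = 1.0 / se**2
    return np.sum(w * x) / np.sum(w), np.sqrt(1.0 / np.sum(w))

mu_a, sd_a = weighted_mean(expr_a, se_a)
mu_b, sd_b = weighted_mean(expr_b, se_b)
z = (mu_b - mu_a) / np.sqrt(sd_a**2 + sd_b**2)
print(f"uncertainty-weighted log fold change {mu_b - mu_a:.2f}, z = {z:.2f}")
```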

    Pre-processing Agilent microarray data

    Background: Pre-processing methods for two-sample long-oligonucleotide arrays, specifically the Agilent technology, have not been extensively studied. The goal of this study is to quantify some of the sources of error that affect measurement of expression using Agilent arrays and to compare Agilent's Feature Extraction software with the pre-processing methods that have become standard for cDNA arrays: log transformation followed by loess normalization, with or without background subtraction, and often a between-array scale normalization procedure. The larger goal is to define best study design and pre-processing practices for Agilent arrays, and we offer some suggestions.
    Results: Simple loess normalization without background subtraction produced the lowest variability. However, without background subtraction, fold changes were biased towards zero, particularly at low intensities. ROC analysis of a spike-in experiment showed that differentially expressed genes are most reliably detected when background is not subtracted. Loess normalization with no background subtraction yielded an AUC of 99.7%, compared with 88.8% for Agilent-processed fold changes. All methods performed well when error was taken into account by t- or z-statistics (AUCs ≥ 99.8%). A substantial proportion of genes showed dye effects, 43% (99% CI: 39%, 47%); however, these effects were generally small regardless of the pre-processing method.
    Conclusion: Simple loess normalization without background subtraction resulted in low-variance fold changes that ranked gene expression more reliably than the other methods. While t-statistics and other measures that take variation into account, including Agilent's z-statistic, can also be used to reliably select differentially expressed genes, fold changes are a standard measure of differential expression for exploratory work, cross-platform comparison, and biological interpretation, and cannot be entirely replaced. Although dye effects are small for most genes, many array features are affected; therefore, an experimental design that incorporates dye swaps or a common reference could be valuable.
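    A rough sketch of the "log transformation followed by loess normalization" step discussed above (not Agilent's Feature Extraction software), using simulated two-colour intensities; the dye-bias model and the frac smoothing parameter are arbitrary choices for illustration.

```python
# M = log ratio, A = average log intensity; the loess fit of M on A is
# subtracted so that the log ratio no longer trends with intensity.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(1)
red   = rng.lognormal(mean=7.0, sigma=1.0, size=2000)      # Cy5 foreground intensities (simulated)
green = red * np.exp(0.2 + 0.05 * np.log2(red)) \
        * rng.lognormal(0.0, 0.2, size=2000)               # Cy3 with an intensity-dependent dye bias

M = np.log2(red) - np.log2(green)                          # log ratio per feature
A = 0.5 * (np.log2(red) + np.log2(green))                  # average log intensity per feature

fit = lowess(M, A, frac=0.3, return_sorted=False)          # loess trend of M as a function of A
M_norm = M - fit                                           # normalized log ratios

print(f"median M before: {np.median(M):+.3f}, after: {np.median(M_norm):+.3f}")
```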

    A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods

    Background: The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws.
    Results: We have used this data set in a comparison of methods which is far more extensive than any previous study. We outline six stages in the analysis pipeline where decisions need to be made, and show how the results of these decisions can lead to the apparently contradictory results previously found. We also show that, while flawed, this data set is still a useful tool for method comparison, particularly for identifying combinations of summarization and differential expression methods that are unlikely to perform well on real data sets. We describe a new benchmark, AffyDEComp, that can be used for such a comparison.
    Conclusion: We conclude with recommendations for preferred Affymetrix analysis tools, and for the development of future spike-in data sets.
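    The sketch below illustrates, generically, the kind of comparison a spike-in benchmark enables (it is not AffyDEComp itself): when the truly changed probe sets are known, any differential-expression score can be summarized by a ROC AUC and methods compared on that basis. The scores here are simulated.

```python
# Compare two hypothetical scoring methods against known spike-in truth.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
truth = np.r_[np.ones(100), np.zeros(900)]          # 100 spiked (changed) probe sets, 900 unchanged

# Simulated differential-expression scores: method 1 separates truth better than method 2.
score_method_1 = truth * rng.normal(2.0, 1.0, 1000) + rng.normal(0, 1, 1000)
score_method_2 = truth * rng.normal(1.0, 1.0, 1000) + rng.normal(0, 1, 1000)

for name, s in [("method 1", score_method_1), ("method 2", score_method_2)]:
    print(f"{name}: ROC AUC = {roc_auc_score(truth, s):.3f}")
```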

    Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments

    Background: High-throughput sequencing technologies, such as the Illumina Genome Analyzer, are powerful new tools for investigating a wide range of biological and medical questions. Statistical and computational methods are key for drawing meaningful and accurate conclusions from the massive and complex datasets generated by the sequencers. We provide a detailed evaluation of statistical methods for normalization and differential expression (DE) analysis of Illumina transcriptome sequencing (mRNA-Seq) data.
    Results: We compare statistical methods for detecting genes that are significantly DE between two types of biological samples and find that there are substantial differences in how the test statistics handle low-count genes. We evaluate how DE results are affected by features of the sequencing platform, such as varying gene lengths, the base-calling calibration method (with and without a phi X control lane), and flow-cell/library preparation effects. We investigate the impact of the read-count normalization method on DE results and show that the standard approach of scaling by total lane counts (e.g., RPKM) can bias estimates of DE. We propose more general quantile-based normalization procedures and demonstrate an improvement in DE detection.
    Conclusions: Our results have significant practical and methodological implications for the design and analysis of mRNA-Seq experiments. They highlight the importance of appropriate statistical methods for normalization and DE inference, to account for features of the sequencing platform that could impact the accuracy of results. They also reveal the need for further research in the development of statistical and computational methods for mRNA-Seq.
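    As a hedged sketch of the normalization issue described above, the code contrasts scaling by total lane counts with a simple upper-quartile (quantile-based) scaling; the exact procedure proposed in the paper may differ, and the counts are simulated.

```python
# Contrast total-count scaling factors with upper-quartile scaling factors.
import numpy as np

rng = np.random.default_rng(3)
counts = rng.negative_binomial(n=5, p=0.3, size=(1000, 4)).astype(float)  # genes x lanes
counts[:5, 2:] *= 50        # a few very highly expressed genes dominate lanes 3 and 4

total_factors = counts.sum(axis=0)
total_factors /= total_factors.mean()

uq_factors = np.array([np.percentile(c[c > 0], 75) for c in counts.T])    # upper quartile of nonzero counts per lane
uq_factors /= uq_factors.mean()

print("total-count scaling factors:", np.round(total_factors, 2))
print("upper-quartile factors:     ", np.round(uq_factors, 2))
# The dominant genes inflate the total-count factors for lanes 3-4, which would
# deflate every other gene in those lanes; the upper-quartile factors are much
# less affected by those few genes.
```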

    Genome-scale DNA methylation mapping of clinical samples at single-nucleotide resolution

    Bisulfite sequencing measures absolute levels of DNA methylation at single-nucleotide resolution, providing a robust platform for molecular diagnostics. Here, we optimize bisulfite sequencing for genome-scale analysis of clinical samples. Specifically, we outline how restriction digestion targets bisulfite sequencing to hotspots of epigenetic regulation; we show that 30 ng of DNA are sufficient for genome-scale analysis; we demonstrate that our protocol works well on formalin-fixed, paraffin-embedded (FFPE) samples; and we describe a statistical method for assessing the significance of altered DNA methylation patterns.
    Funding: National Institutes of Health (U.S.) grants R01HG004401, U54HG03067 and U01ES017155.
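    The paper describes its own statistical method for assessing altered methylation; the sketch below is only a generic stand-in for that step, applying Fisher's exact test to methylated/unmethylated read counts at a single CpG in two samples. The counts are invented.

```python
# Generic per-CpG differential methylation test on bisulfite read counts.
from scipy.stats import fisher_exact

meth_tumour, unmeth_tumour = 45, 5     # bisulfite reads at one CpG, tumour sample
meth_normal, unmeth_normal = 12, 38    # same CpG, matched normal sample

odds_ratio, p = fisher_exact([[meth_tumour, unmeth_tumour],
                              [meth_normal, unmeth_normal]])
print(f"odds ratio {odds_ratio:.1f}, p = {p:.2e}")
```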

    Determining gene expression on a single pair of microarrays

    Background: In microarray experiments the number of replicates is often limited by factors such as cost, availability of sample, or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new algorithm called PINC (PINC is Not Cyber-T) that can analyze such Affymetrix microarray experiments.
    Results: PINC treats each pair of probes within a probe set as an independent measure of gene expression, using the Bayesian framework of the Cyber-T algorithm, and then assigns a corrected p-value for each gene comparison. The p-values generated by PINC accurately control the false discovery rate on Affymetrix control data sets, yet are small enough that family-wise error rate procedures (such as Holm's step-down method) can be used as a conservative alternative to false discovery rate control with little loss of sensitivity on those data sets.
    Conclusion: PINC outperforms previously published methods for determining differentially expressed genes when comparing Affymetrix microarrays with N = 1 in each condition. When applied to biological samples, PINC can be used to assess the degree of variability observed among biological replicates in addition to analyzing isolated pairs of microarrays.
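    As a small illustration of the Holm step-down correction mentioned above as a conservative alternative to false discovery rate control, the sketch compares Holm and Benjamini-Hochberg adjustment on a set of placeholder p-values (these are not PINC outputs).

```python
# Compare family-wise (Holm) and FDR (Benjamini-Hochberg) multiple-testing corrections.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([1e-6, 4e-5, 3e-4, 0.002, 0.01, 0.04, 0.2, 0.6])  # placeholder per-gene p-values

holm_reject, holm_adj, _, _ = multipletests(p_values, alpha=0.05, method="holm")
fdr_reject,  fdr_adj,  _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")

for p, h, f in zip(p_values, holm_reject, fdr_reject):
    print(f"p = {p:<8g} Holm: {'reject' if h else 'keep  '}  FDR(BH): {'reject' if f else 'keep'}")
```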